Creating RSS for News Archives and Beyond
Abstract
RSS (Rich Site Summary) is becoming an invaluable format and tool for news feeds. More and more news organizations are realizing its benefits, and content publishers are joining the already crowded RSS club. In an era of information explosion and peer-to-peer sharing, RSS is well suited to content publishing, archiving, sharing, and much more. However, it came late: it should have appeared when the Internet became popular and news organizations were making their online debut. Over the last decade, an enormous number of news articles were published and, at the same time, improperly archived for lack of a flexible and widely accepted archival format. Still, better late than never. As we now explore the possibilities of RSS, this is the time to smooth the transition for old, unformatted news articles and to make the format uniform across all news articles, new and old. Extracting the metadata of old news articles is one way to create their RSS versions. In this paper we report our progress in extracting news metadata with support vector classifiers and show that applying the classifiers in a chosen order is more useful than applying them in random order. We also show preliminary results on using TIMEX tags to extract news events, which can help go beyond RSS by creating individual event lines instead of placing the whole story on a single timeline.
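As a rough illustration of the last step the abstract describes, turning already-extracted metadata into an RSS version of an old article, the sketch below builds a minimal RSS 2.0 feed with Python's standard library. It is not the authors' implementation: the function name `build_rss`, the dictionary keys (`title`, `link`, `summary`, `published`), and the example URLs are all assumptions made for illustration, and the metadata extraction itself (the classifier stage) is taken as given.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title, channel_link, channel_description, articles):
    """Serialize article metadata as a minimal RSS 2.0 document.

    `articles` is a list of dicts with the hypothetical keys
    'title', 'link', 'summary', and 'published' (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description

    for article in articles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = article["title"]
        ET.SubElement(item, "link").text = article["link"]
        ET.SubElement(item, "description").text = article["summary"]
        # pubDate uses the RFC 822 style date format.
        ET.SubElement(item, "pubDate").text = format_datetime(article["published"])

    return ET.tostring(rss, encoding="unicode")


# Hypothetical metadata for one archived article.
archived = [
    {
        "title": "Example archived headline",
        "link": "http://example.com/archive/1",
        "summary": "Short summary recovered from the old article.",
        "published": datetime(2006, 1, 2, tzinfo=timezone.utc),
    }
]
feed = build_rss("News Archive", "http://example.com/", "Archived news", archived)
print(feed)
```

Keeping the feed builder separate from the extraction stage means the same function could serve both new articles and archived ones once their metadata is available in the same shape.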
Similar resources
Relational RSS Clustering Techniques
There has been an explosion in the amount of news and current-event information available on the Internet. 24-hour news outlets and independent bloggers alike flood the wires with a constant stream of data. Though an increasing number of people rely on the Internet as their primary source of news and current events, it is becoming increasingly difficult for users to find what t...
Developing Seamless Discovery of Scholarly and Trade Journal Resources via OAI and RSS
The usefulness of online information such as e-publishing and timely notification on the latest scientific or professional news has been widely accepted. However, access to such valuable information is often limited by lack of mechanisms for interoperability and distributed harvesting of the source databases. Following recent experiments at EEVL (Internet Guide to Engineering, Mathematics and C...
Eliminating Redundant and Less-Informative RSS News Articles Based on Word Similarity and a Fuzzy Equivalence Relation
Ian Garcia, Department of Computer Science, Master of Science. The Internet has marked this era as the information age. There is no precedent for the amount of information, especially network news, that Internet users can access these days. As a result, the problem ...
Synthesizing correlated RSS news articles based on a fuzzy equivalence relation
Tens of thousands of news articles are posted on-line each day, covering topics from politics to science to current events. To better cope with this overwhelming volume of information, RSS (news) feeds are used to categorize newly posted articles. Nonetheless, most RSS users must filter through many articles within the same or different RSS feeds to locate articles pertaining to their particula...
Relating RSS News/Items
Merging related RSS news (coming from one or different sources) is beneficial for end-users with different backgrounds (journalists, economists, etc.), particularly those accessing similar information. In this paper, we provide a practical approach to both: measure the relatedness, and identify relationships between RSS elements. Our approach is based on the concepts of semantic neighborhood an...
Publication date: 2006